Abstract

An interesting classical result due to Jackson allows polynomial-time
learning of the function class DNF using membership queries. Since in most
practical learning situations access to a membership oracle is unrealistic, this paper
explores the possibility that quantum computation might allow a learning algorithm
for DNF that relies only on example queries. A natural extension of Fourier-based
learning into the quantum domain is presented. The algorithm requires only an
example oracle, and it runs in $O(\sqrt{2^n})$ time, a result that appears to be classically
impossible. The algorithm is unique among quantum algorithms in that it does not
assume a priori knowledge of a function and does not operate on a superposition
that includes all possible basis states.



1. Introduction 

The field of computational learning theory (COLT) attempts to formalize the process of 
inductive learning and to develop bounds for classes of functions that are learnable in some formal 
sense. One particular function class that has received a great deal of attention is the class of binary 
functions that are representable in disjunctive normal form (DNF). This function class is very 
expressive and has resisted attempts to produce an algorithm for learning it under the rigorous 
PAC-learning model often used in computational learning theory. Recently, however, Jackson has
produced an algorithm that learns DNF under the uniform distribution in polynomial time with 
access to a membership oracle (these terms will be defined shortly). This is an impressive 
theoretical result that relies on two key assumptions: uniform distribution of the function examples 
and access to a membership oracle. While the assumption of a uniform distribution is often 
reasonable in practice, access to a membership oracle usually is not. 

The field of quantum computation (QC) investigates the power of the unique characteristics 
of quantum systems used as computational machines. Early quantum computational successes 
have been impressive, yielding results that are classically impossible [Sho97] [Gro96] [Hog96] 
[Ter98]. Thus, in an attempt to extend Jackson's result so that it does not rely on a membership 
oracle, this paper presents a Fourier-based quantum learning algorithm. Specifically, the algorithm 
is designed to find the large Fourier coefficients of boolean functions in polynomial time with 
access to only a classical example oracle. The algorithm is not completely successful because it 
fails to guarantee learning in polynomial time. However, it can be shown to learn in time $O(\sqrt{2^n})$.
In contrast, classically estimating the Fourier coefficients requires $O(2^n)$ time. Therefore, although
we have failed to accomplish the ultimate goal of polynomial-time learning using only an example
oracle, we have nevertheless discovered a result that is both theoretically interesting (since it
accomplishes something that seems classically impossible) and to some extent useful in a practical
sense (since it extends the size of problems to which we can practically apply a Fourier-based
learning scheme). We have, in essence, traded Jackson's impressive theoretical polynomial time
bound for a practical assumption and a more modest time bound. It is also interesting to note that
Grover's well-known quantum searching algorithm provides the same $O(\sqrt{2^n})$ improvement
over its classical equivalent [Gro96].

Section 2 provides a simplified overview of computational learning theory and briefly 
discusses work in COLT related to our results. Section 3 introduces quantum computation and 
some of its basic ideas and early successes. Since a familiarity with basic ideas from 
computational learning theory is assumed, only a few necessary remarks on the subject are 
provided here; quantum computation, on the other hand, is treated more carefully as it is likely that 
readers will be less familiar with it. Since neither of these subjects can be properly covered here, 
references for further study are also provided. Section 4 discusses Fourier-based learning methods 
in some detail. A Fourier-based, inductive quantum learning algorithm that only requires access to 
an example oracle is presented in section 5. Section 5 also discusses why the algorithm will not 
learn the function class DNF in polynomial time and then shows that it will learn in $O(\sqrt{2^n})$ time.
The paper concludes with final remarks and directions for further research in section 6. 

2. Computational Learning Theory 

A rigorous approach to machine learning is traditionally traced back to Valiant [Val84], and
the resulting computational learning theory has provided a formal basis for machine learning. In
particular, the PAC (Probably Approximately Correct) model has yielded some nice theoretical
results proving learnability of various function classes. Under this model an example oracle $E$ for
a function $f$ may be queried for a random example of the form $x \to f(x)$ from $f$ according to a
distribution $D$ that governs the frequency of the examples (basically this is equivalent to having
access to a "large enough" training set). A function class $F$ is termed strongly PAC-learnable if
there exists an algorithm $A$ such that for any $f \in F$ and for any $D$, $A$ produces a hypothesis $h$ such
that $P(|h(x) - f(x)| > \epsilon) < \delta$, and $A$ requires at most $m$ (where $m$ is polynomial in the size of the
input) queries of $E$ and runs in time polynomial in $m$, $1/\delta$ and $1/\epsilon$. Here $\epsilon$ is called the error and $\delta$
the confidence. In other words, a function class is strongly PAC-learnable if there exists a learning
algorithm that with high probability will produce a relatively accurate hypothesis in a polynomial
amount of time. Another type of learning that may be defined is weak learning. A function class $F$
is weakly PAC-learnable if $F$ is PAC-learnable for $\epsilon = 1/2 - 1/p(n,r)$, where $p$ is a fixed polynomial
and $r$ is the size of $f$. That is, a function class is weakly PAC-learnable if there exists a learning
algorithm that with high probability will produce a hypothesis that is at least slightly more accurate
than random guessing.

However, the class DNF (disjunctive normal form) in which functions are expressed as a 
disjunction of clauses, each of which consists of a conjunction of literals (binary variables or their 
negation), has resisted efforts to develop provably efficient algorithms for its learnability. This is 
unfortunate because DNF is an extremely expressive class of functions, and it would therefore be 
significant to prove its learnability. Until recently, DNF as a general function class has resisted all
attempts to find an algorithm that will learn it in the PAC sense; however, much work has been
done regarding its learnability, and some encouraging results have been obtained for restricted 
subclasses of DNF [Aiz91] [Blu92] [Bsh93] [Kea87]. Currently, a particularly promising 
approach to the unrestricted class DNF is the use of discrete Fourier analysis in the development of 
machine learning algorithms, including work by Goldreich and Levin [Gol89], Kushilevitz and 
Mansour [Kus93], and Jackson [Jac94]. Jackson's work has yielded the most encouraging results 
so far concerning DNF, as he is able to produce the first positive learning results for the class of 
unrestricted DNF. Jackson's basic approach is to first note that binary functions may be 
represented as a discrete Fourier expansion and that all DNF functions may be at least weakly 
approximated using a single Fourier basis function whose coefficient is "large." His idea is to find 
those large coefficients by approximating the function using examples and to combine them in such 
a way as to produce a good hypothesis for the function. Jackson's learning algorithm, which he 
calls the Harmonic Sieve, combines techniques from discrete Fourier analysis due to Goldreich and 
Levin [Gol89] and Kushilevitz and Mansour [Kus93] with the boosting ideas of Schapire [Sha90]
and Freund [Fre90] [Fre92]. The Harmonic Sieve guarantees strong learnability of DNF with two
caveats: first, the function distribution D must be uniform (actually this is relaxed somewhat, but it 
is still restrictive) and second, access to a membership oracle M rather than to an example oracle E 
is required. A membership oracle is like a black box version of the function to be learned, so that 
while an example oracle simply returns a random example upon being queried, a membership 
oracle can be queried for specific examples. In other words, when queried with x, the membership 
oracle must return $f(x)$. This characteristic makes a membership oracle strictly more powerful than
an example oracle, and the Harmonic Sieve's dependence upon such an oracle, along with its 
restrictions on the distribution D, eliminates it from consideration as an algorithm for learning DNF 
in the PAC sense. Further, although Jackson's results are extremely impressive from a theoretical 
standpoint, they will likely yield very few practical results because, in general, access to a 
membership oracle is not realistic. On the other hand, access to an example oracle is much more 
realistic as this is basically equivalent to having access to a large training set. 
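
The distinction between the two oracles can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the helper `make_oracles`, the three-variable target `dnf`, and the use of a uniform distribution are hypothetical choices made here, not part of the algorithms discussed in this paper.

```python
import random


def make_oracles(f, n, seed=0):
    """Return (example_oracle, membership_oracle) for f on n-bit inputs.

    Hypothetical helper for illustration only; the uniform distribution
    stands in for the distribution D in the text.
    """
    rng = random.Random(seed)

    def example_oracle():
        # The learner cannot choose x: it is drawn at random (uniformly here).
        x = tuple(rng.randint(0, 1) for _ in range(n))
        return x, f(x)

    def membership_oracle(x):
        # The learner chooses x and is told f(x).
        return f(x)

    return example_oracle, membership_oracle


def dnf(x):
    """Hypothetical target: (x0 AND x1) OR (NOT x2)."""
    return int((x[0] and x[1]) or (not x[2]))


E, M = make_oracles(dnf, n=3)
x, label = E()           # a random labelled example
chosen = M((1, 1, 1))    # a query at a point the learner picked
```

An algorithm written against `E` alone can only wait for informative examples to arrive, whereas one written against `M` can probe exactly the inputs it needs; this is the sense in which the membership oracle is the stronger assumption.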

Bshouty and Jackson [Bsh95] realized this and investigated using quantum computation to 
improve Jackson's work, just as is proposed here. The result was an extension to the Harmonic 
Sieve that depended upon a quantum example oracle rather than upon a classical membership 
oracle. Since the quantum example oracle is strictly less powerful than the classical membership 
oracle, this is a positive theoretical result. However, as the authors point out, it is unclear how to 
construct a quantum example oracle (other than perhaps by utilizing a classical membership oracle), 
and therefore their approach is again not useful in a practical sense. In contrast, the algorithm 
presented here goes one step further by depending only upon a classical example oracle, thus 
providing the possibility of practical algorithms for learning the class DNF. 

3. Quantum Computation 

Quantum computation is based upon physical principles from the theory of quantum 
mechanics (QM), which is in many ways counterintuitive. Yet it has provided us with perhaps the 
most accurate physical theory (in terms of predicting experimental results) ever devised by science. 
The theory is well-established and is covered in its basic form by many textbooks (see for example 
[Fey65]). Several necessary ideas that form the basis for the study of quantum computation are
briefly reviewed here. 

3.1 Linear Superposition 

Linear superposition is closely related to the familiar mathematical principle of linear
combination of vectors. Quantum systems are described by a wave function $\psi$ that exists in a
Hilbert space [You88]. The Hilbert space has a set of states, $|\phi_i\rangle$, that form a basis, and the
system is described by a quantum state $|\psi\rangle$,

$$|\psi\rangle = \sum_i c_i |\phi_i\rangle. \tag{1}$$

$|\psi\rangle$ is said to be in a linear superposition of the basis states $|\phi_i\rangle$, and in the general case, the
coefficients $c_i$ may be complex. Use is made here of the Dirac bracket notation, where the ket $|\cdot\rangle$ is
analogous to a column vector, and the bra $\langle\cdot|$ is analogous to the complex conjugate transpose of
the ket. In quantum mechanics the Hilbert space and its basis have a physical interpretation, and
this leads directly to perhaps the most counterintuitive aspect of the theory. The counter intuition is
this: at the microscopic or quantum level, the state of the system is described by the wave
function $\psi$, that is, as a linear superposition of all basis states (i.e. in some sense the system is in
all basis states at once). However, at the macroscopic or classical level the system can be in only a
single basis state. For example, at the quantum level an electron can be in a superposition of many
different energies; however, in the classical realm this obviously cannot be.

3.2 Coherence and decoherence 

Coherence and decoherence are closely related to the idea of linear superposition. A
quantum system is said to be coherent if it is in a linear superposition of its basis states. A result of
quantum mechanics is that if a system that is in a linear superposition of states interacts in any way
with its environment, the superposition is destroyed. This loss of coherence is called decoherence
and is governed by the wave function $\psi$. The coefficients $c_i$ are called probability amplitudes, and
$|c_i|^2$ gives the probability of $|\psi\rangle$ collapsing into state $|\phi_i\rangle$ if it decoheres. Note that the wave
function $\psi$ describes a real physical system that must collapse to exactly one basis state.
Therefore, the probabilities governed by the amplitudes must sum to unity. This necessary
constraint is expressed as the unitarity condition

$$\sum_i |c_i|^2 = 1. \tag{2}$$

In the Dirac notation, the probability that a quantum state $|\psi\rangle$ will collapse into an eigenstate $|\phi_i\rangle$
is written $|\langle\phi_i|\psi\rangle|^2$ and is analogous to the dot product (projection) of two vectors. Consider, for
example, a discrete physical variable called spin. The simplest spin system is a two-state system,
called a spin-1/2 system, whose basis states are usually represented as $|\uparrow\rangle$ (spin up) and $|\downarrow\rangle$ (spin
down). In this simple system the wave function $\psi$ is a distribution over two values (up and down)
and a coherent state $|\psi\rangle$ is a linear superposition of $|\uparrow\rangle$ and $|\downarrow\rangle$. One such state might be

$$|\psi\rangle = \frac{2}{\sqrt{5}}|\uparrow\rangle + \frac{1}{\sqrt{5}}|\downarrow\rangle. \tag{3}$$

As long as the system maintains its quantum coherence it cannot be said to be either spin up or spin
down. It is in some sense both at once. Classically, of course, it must be one or the other, and
when this system decoheres the result is, for example, the $|\uparrow\rangle$ state with probability

$$|\langle\uparrow|\psi\rangle|^2 = \left(\frac{2}{\sqrt{5}}\right)^2 = 0.8. \tag{4}$$

A simple two-state quantum system, such as the spin- 1/2 system just introduced, is used as 
the basic unit of quantum computation. Such a system is referred to as a quantum bit or qubit, and 
renaming the two states $|0\rangle$ and $|1\rangle$ it is easy to see why this is so.

3.3 Operators 

Operators on a Hilbert space describe how one wave function is changed into another.
Here they will be denoted by a capital letter with a hat, such as $\hat{A}$, and they may be represented as
matrices acting on vectors. Using operators, an eigenvalue equation can be written $\hat{A}|\phi_i\rangle = a_i|\phi_i\rangle$,
where $a_i$ is the eigenvalue. The solutions $|\phi_i\rangle$ to such an equation are called eigenstates and can be
used to construct the basis of a Hilbert space as discussed in section 3.1. In the quantum
formalism, all properties are represented as operators whose eigenstates are the basis for the
Hilbert space associated with that property and whose eigenvalues are the quantum allowed values
for that property. It is important to note that operators in quantum mechanics must be linear
operators and further that they must be unitary so that $\hat{A}^\dagger\hat{A} = \hat{A}\hat{A}^\dagger = \hat{I}$, where $\hat{I}$ is the identity
operator, and $\hat{A}^\dagger$ is the complex conjugate transpose, or adjoint, of $\hat{A}$.

3.4 Interference 



Interference is a familiar wave phenomenon. Wave peaks that are in phase interfere 
constructively (magnify each other's amplitude) while those that are out of phase interfere 
destructively (decrease or eliminate each other's amplitude). This is a phenomenon common to all 
kinds of wave mechanics from water waves to optics. The well-known double slit experiment 
demonstrates empirically that at the quantum level interference also applies to the probability waves 
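of quantum mechanics. As a simple example, suppose that the wave function described in (3) is
represented in vector form as

$$|\psi\rangle = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} \tag{5}$$

and suppose that it is operated upon by an operator $\hat{O}$ described by the following matrix,

$$\hat{O} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \tag{6}$$

The result is

$$\hat{O}|\psi\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} 3 \\ 1 \end{pmatrix} \tag{7}$$

and therefore now

$$|\psi\rangle = \frac{3}{\sqrt{10}}|\uparrow\rangle + \frac{1}{\sqrt{10}}|\downarrow\rangle. \tag{8}$$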



Notice that the amplitude of the $|\uparrow\rangle$ state has increased while the amplitude of the $|\downarrow\rangle$ state has
decreased. This is due to the wave function interfering with itself through the action of the
operator: the different parts of the wave function interfere constructively or destructively
according to their relative phases just like any other kind of wave.
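
The interference calculation above can be checked numerically. The following is a minimal classical sketch, assuming the two-component state $\frac{1}{\sqrt{5}}(2, 1)^T$ and the operator $\frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$ of the example; the helper `apply` and the variable names are introduced here purely for illustration.

```python
import math


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of amplitudes)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# State of the spin example: amplitudes for spin up and spin down.
psi = [2 / math.sqrt(5), 1 / math.sqrt(5)]

# The operator of the example, with the assumed 1/sqrt(2) normalization.
s = 1 / math.sqrt(2)
O = [[s, s], [s, -s]]

new_psi = apply(O, psi)

# Measurement probabilities are the squared amplitudes.
probs_before = [c * c for c in psi]
probs_after = [c * c for c in new_psi]
```

The up amplitude grows because the two contributions to it add with the same sign, while the two contributions to the down amplitude partly cancel; the total probability is unchanged because the operator is unitary.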

3.5 Entanglement 

Entanglement is the potential for quantum states to exhibit correlations that cannot be
accounted for classically. From a computational standpoint, entanglement seems intuitive enough:
it is simply the fact that correlations can exist between different qubits; for example, if one qubit
is in the $|1\rangle$ state, another will be in the $|1\rangle$ state. However, from a physical standpoint,
entanglement is little understood. The questions of what exactly it is and how it works are still not
resolved. What makes it so powerful (and so little understood) is the fact that since quantum states
exist as superpositions, these correlations somehow exist in superposition as well. When the
superposition is destroyed, the proper correlation is somehow communicated between the qubits,
and it is this "communication" that is the crux of entanglement. There are different degrees of
entanglement and much work has been done on better understanding and quantifying it [Joz97]
[Ved97]. It is interesting to note from a computational standpoint that quantum states that are
superpositions of only basis states that are maximally far apart in terms of Hamming distance are
those states with the greatest entanglement. For example, a superposition of only the states $|00\rangle$
and $|11\rangle$, which have a maximum Hamming spread, is maximally entangled. Finally, it should be
mentioned that while interference is a quantum property that has a classical cousin, entanglement is
a completely quantum phenomenon for which there is no classical analog.

3.6 Quantum Algorithms 

The field of quantum computation, which applies ideas from quantum mechanics to the
study of computation, was introduced in the mid 1980's [Fey86] [Ben82]. For a readable
introduction to quantum computation see [Bar96]; for a more rigorous treatment see for example
[Deu85]. The field is still in its infancy and very theoretical but offers exciting possibilities for the
field of computer science: the most important quantum algorithms discovered to date all perform
tasks for which there are no classical equivalents. For example, Deutsch's algorithm [Deu92] is
designed to solve the problem of identifying whether a binary function is constant (function values
are either all 1 or all 0) or balanced (the function takes an equal number of 0 and 1 values).
Deutsch's algorithm accomplishes the task in order $O(1)$ time, while classical methods require
$O(2^n)$ time, where $n$ is the number of binary inputs to the function. Simon's algorithm [Sim97] is
constructed to solve the following promise problem. A function $f: \{0,1\}^n \to \{0,1\}^n$ is guaranteed
to be either a random 1-1 function or a periodic 2-1 function such that $f(x) = f(x \oplus s)$ for all $x$ for
some $n$-bit string $s$ (where $\oplus$ denotes the bitwise exclusive OR). Here again an exponential
speedup is achieved; however, admittedly, both these algorithms have been designed for artificial,
somewhat contrived problems. Grover's algorithm [Gro96], on the other hand, provides a method
for searching an unordered quantum database in time $O(\sqrt{2^n})$, compared to the classical lower
bound of $O(2^n)$. Here is a real-world problem for which quantum computation provides
performance that is classically impossible (though the speedup is less dramatic than exponential).
Finally, the most well-known and perhaps the most important quantum algorithm discovered so far
is Shor's algorithm for prime factorization [Sho97]. This algorithm finds the prime factors of very
large numbers in polynomial time, whereas the best known classical algorithms require exponential
time. The implications for the field of cryptography are profound because many cryptographic
systems, including the well-known RSA system [Riv78], depend upon the problem of prime
factorization requiring exponential time. These quantum algorithms take advantage of the unique
features of quantum systems to provide significant speedup over classical approaches.

It is worth mentioning that very recently several different groups have succeeded in 
physically realizing small-scale quantum computers and implementing some of the above 
mentioned algorithms with them [Jon98] [Chu98]. Also, work on quantum error correction has 
made impressive advances [Sho95] [Cor98] crucial to the construction of larger scale quantum 
computers. Therefore, although some formidable technological hurdles still exist, it is not 
unreasonable to suggest that quantum computational systems that perform nontrivial computation 
are much closer to realization than was thought possible even as recently as two or three years ago. 
In the meantime, it is important to develop a theory of quantum computation so that when the
technology does become available, it may be exploited. Further, techniques and ideas that result 
from developing quantum algorithms may be useful in the development of new classical 
algorithms. Finally, the process of understanding and developing a theory of quantum 
computation provides insight and contributes to a furthering of our understanding and development 
of a general theory of computation. 



4. Fourier-Based Learning 

This section gives a brief description of Fourier-based learning as it applies to our approach. In
what follows, the general Fourier-based learning approach used by Kushilevitz, Mansour, Jackson
and others will for simplicity be referred to as KMJ. The subject of discrete Fourier analysis is
well developed and is treated only very briefly here. For a more in-depth presentation see any
book on Fourier analysis, for example [Mor94].

4.1 Some ideas from Fourier analysis 

A bipolar-valued binary function $f: \{0,1\}^n \to \{-1,1\}$ can be represented as a Fourier
expansion (the expansion used here is actually a simplified Fourier expansion called a Walsh
transform)

$$f(x) = \sum_{a \in \{0,1\}^n} \hat{f}(a)\chi_a(x), \tag{9}$$

where the Fourier basis functions $\chi_a(x)$ are defined as

$$\chi_a(x) = (-1)^{a^T x} \tag{10}$$

and the Fourier coefficients are given by

$$\hat{f}(a) = \frac{1}{2^n}\sum_{x \in \{0,1\}^n} f(x)\chi_x(a). \tag{11}$$

Actually, in the general case (11) should use $\chi^*$ (the complex conjugate of $\chi$); however, in the
simplified case considered here (bipolar rather than complex output), $\chi^* = \chi$. The KMJ method
learns $f$ in polynomial time by approximating a polynomial number of large Fourier coefficients in
the context of a boosting algorithm due to Freund [Fre90] [Fre92]. In order to determine the set $A$
of large coefficients, the method requires access to a membership oracle. The large coefficients,
$\hat{f}(a)$ (for $a \in A$), are then approximated by

$$\hat{\tilde{f}}(a) = \frac{1}{m}\sum_{x \in T} f(x)\chi_x(a), \tag{12}$$

with $T$ being a set of $m$ carefully chosen examples of the form $x \to f(x)$ with $x \in \{0,1\}^n$ and
$f(x) \in \{-1,1\}$. Finally, using the fact that the function can be at least weakly approximated with a
polynomial number of large coefficients (the set $A$) and using (12) to approximate those
coefficients, the function $f$ may be approximated by

$$\tilde{f}(x) = \sum_{a \in A} \hat{\tilde{f}}(a)\chi_a(x). \tag{13}$$

We propose here a quantum algorithm that determines the set $A$ using only an example oracle (i.e.
a training set) rather than the membership oracle required by KMJ. In other words, the KMJ
approach requires the ability to choose which examples will be used to learn the function (which in
typical learning problems we cannot do); on the other hand, the method that is proposed here
makes no such requirement: a standard training set suffices.
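
The definitions (10), (11) and (12) translate directly into a few lines of code. The sketch below is a brute-force classical illustration, not the KMJ algorithm or the quantum algorithm of this paper; the function names `chi`, `fourier_coefficients`, `estimate_coefficient` and the parity target are hypothetical and chosen here only for the example.

```python
from itertools import product


def chi(a, x):
    """Walsh basis function chi_a(x) = (-1)^(a . x) for bit tuples a and x."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1


def fourier_coefficients(f, n):
    """Exact coefficients as in (11): average f(x) * chi_x(a) over all 2^n inputs."""
    points = list(product((0, 1), repeat=n))
    return {a: sum(f(x) * chi(x, a) for x in points) / 2 ** n for a in points}


def estimate_coefficient(a, examples):
    """Estimate as in (12) from m labelled examples given as (x, f(x)) pairs."""
    return sum(fx * chi(x, a) for x, fx in examples) / len(examples)


def parity(x):
    """Hypothetical target: bipolar parity of bits 0 and 2 of a 3-bit input."""
    return chi((1, 0, 1), x)


coeffs = fourier_coefficients(parity, 3)
sample = [(x, parity(x)) for x in [(0, 0, 0), (1, 0, 0), (1, 1, 1)]]
estimate = estimate_coefficient((1, 0, 1), sample)
```

For this target the whole spectrum sits on the single index 101, so one basis function represents the function exactly. Note that the exact computation touches all $2^n$ inputs, which is where the exponential classical cost comes from.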

4.1.1 A Fourier example

A simple example will help illustrate the concept of a Fourier expansion. Let $n = 2$ and

$$f = \left\{\begin{array}{l} 00 \to 1 \\ 01 \to 1 \\ 10 \to -1 \\ 11 \to 1 \end{array}\right. .$$

To calculate the Fourier basis functions use (10) and for example,

$$\chi_{00}(00) = (-1)^{(0\;0)\binom{0}{0}} = (-1)^0 = 1$$

and

$$\chi_{01}(11) = (-1)^{(0\;1)\binom{1}{1}} = (-1)^1 = -1.$$

The other 14 values for the 4 Fourier functions are calculated similarly. Next, calculate the Fourier
coefficients using (11). For example,

$$\hat{f}(11) = \frac{1}{4}\left(f(00)\chi_{00}(11) + f(01)\chi_{01}(11) + f(10)\chi_{10}(11) + f(11)\chi_{11}(11)\right).$$

That is,

$$\hat{f}(11) = \frac{1}{4}\left((1)(1) + (1)(-1) + (-1)(-1) + (1)(1)\right) = \frac{1}{2},$$

and the other coefficients are found in the same manner. Finally, the Fourier expansion of $f$ can be
written using (9),

$$f(x) = \hat{f}(00)\chi_{00}(x) + \hat{f}(01)\chi_{01}(x) + \hat{f}(10)\chi_{10}(x) + \hat{f}(11)\chi_{11}(x),$$

which in this example evaluates to

$$f(x) = \frac{1}{2}\chi_{00}(x) - \frac{1}{2}\chi_{01}(x) + \frac{1}{2}\chi_{10}(x) + \frac{1}{2}\chi_{11}(x),$$

and solving for any of the values 00, 01, 10, or 11 will result in the appropriate output for $f$. Now
if instead of knowing $f$, we only have a training set such as

$$T = \left\{\begin{array}{l} 00 \to 1 \\ 01 \to 1 \\ 10 \to -1 \end{array}\right. ,$$

then the coefficients are approximated using (12) instead of (11). For example now

$$\hat{\tilde{f}}(11) = \frac{1}{3}\left(f(00)\chi_{00}(11) + f(01)\chi_{01}(11) + f(10)\chi_{10}(11)\right),$$

which simplifies to

$$\hat{\tilde{f}}(11) = \frac{1}{3}\left((1)(1) + (1)(-1) + (-1)(-1)\right) = \frac{1}{3}.$$

The Fourier expansion is now approximated using (13) instead of (9),

$$\tilde{f}(x) = \hat{\tilde{f}}(00)\chi_{00}(x) + \hat{\tilde{f}}(01)\chi_{01}(x) + \hat{\tilde{f}}(10)\chi_{10}(x) + \hat{\tilde{f}}(11)\chi_{11}(x),$$

which simplifies to

$$\tilde{f}(x) = \frac{1}{3}\chi_{00}(x) - \frac{1}{3}\chi_{01}(x) + \frac{3}{3}\chi_{10}(x) + \frac{1}{3}\chi_{11}(x).$$

Now solving for 00, 01, or 10 will give 4/3, 4/3, and -4/3 respectively. While these are not the
correct values, they do have the correct sign. On the other hand, solving for 11 results in 0,
which is equivalent to "don't know".
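
The numbers in this example are easy to reproduce. The following sketch recomputes the exact and the approximate coefficients with exact rational arithmetic; the dictionary encoding of $f$ and $T$ and the helper names `chi` and `expand` are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import product


def chi(a, x):
    """Walsh basis function chi_a(x) = (-1)^(a . x)."""
    return -1 if sum(ai * xi for ai, xi in zip(a, x)) % 2 else 1


# The function of the example and the three-element training set.
f = {(0, 0): 1, (0, 1): 1, (1, 0): -1, (1, 1): 1}
T = {x: f[x] for x in [(0, 0), (0, 1), (1, 0)]}
points = list(product((0, 1), repeat=2))

# Exact coefficients (sum over all inputs, divided by 2^n = 4).
exact = {a: Fraction(sum(f[x] * chi(x, a) for x in points), 4) for a in points}

# Approximate coefficients (sum over the training set, divided by m = 3).
approx = {a: Fraction(sum(T[x] * chi(x, a) for x in T), len(T)) for a in points}


def expand(coeffs, x):
    """Evaluate a Fourier expansion with the given coefficients at x."""
    return sum(coeffs[a] * chi(a, x) for a in points)


values = {x: expand(approx, x) for x in points}
```

Evaluating `expand(exact, x)` returns $f(x)$ for every input, while `values` holds 4/3, 4/3, -4/3 on the three training inputs and 0 on the unseen input 11.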

4.2 Matrix formulation 

The first step is to reformulate the Fourier equations in matrix form. Note that the $\chi_a$ can
be considered vectors in a $2^n$-dimensional space indexed by $x \in \{0,1\}^n$. These $\chi_a$ form an
orthonormal basis for the function space with inner product

$$\langle \chi_a, \chi_b \rangle = \frac{1}{2^n}\sum_{x \in \{0,1\}^n} \chi_a(x)\chi_b(x) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

Again, in the general case one of the $\chi$ in (14) should be $\chi^*$, but for the special case of bipolar
outputs $\chi^* = \chi$. Now let $B$ be the matrix formed by taking the $\chi_a$ as rows. Because of the
orthonormality of the basis, the columns are also formed by the $\chi_a$. Also, define $f$ as a vector in
a $2^n$-dimensional space indexed by $x \in \{0,1\}^n$ such that

$$f_x = f(x). \tag{15}$$

Then

$$\frac{1}{2^n}Bf = \hat{f} \tag{16}$$

gives the Fourier coefficients as a vector in a $2^n$-dimensional space indexed by $a \in \{0,1\}^n$ so that

$$\hat{f}_a = \hat{f}(a). \tag{17}$$

To evaluate $f(x)$, define $y$ as the $2^n$-vector indexed by $b \in \{0,1\}^n$ such that

$$y_b = \begin{cases} 1 & \text{if } b = x \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

and calculate

$$By = \vec{\chi}. \tag{19}$$

Now $\vec{\chi}$ is a vector in a $2^n$-dimensional space indexed by $b \in \{0,1\}^n$, the $a$th element of which is
equal to $\chi_a(x)$. Finally, the Fourier representation of $f$ is given as

$$f(x) = (By)^T\left(\frac{1}{2^n}Bf\right). \tag{20}$$

Following the thinking of KMJ, $f$ may be approximated by approximating the Fourier coefficients
using a set of $m$ examples, $T$. Define $\tilde{f}$ as a $2^n$-vector indexed by $x \in \{0,1\}^n$ such that

$$\tilde{f}_x = \begin{cases} f(x) & \text{if } x \in T \\ 0 & \text{otherwise.} \end{cases} \tag{21}$$

Then the Fourier representation of the approximate function is

$$\tilde{f}(x) = (By)^T\left(\frac{1}{m}B\tilde{f}\right). \tag{22}$$

Now, using linear algebra gives

$$\tilde{f}(x) = (By)^T\left(\frac{1}{m}B\tilde{f}\right) = \frac{1}{m}y^T B^T B\tilde{f} = \frac{2^n}{m}y^T\tilde{f} = \begin{cases} \frac{2^n}{m}f(x) & \text{if } x \in T \\ 0 & \text{otherwise.} \end{cases} \tag{23}$$

Recall that $f(x) \in \{-1,1\}$ and can never equal 0. Therefore, approximating $f$ by approximating all
$2^n$ Fourier coefficients results in memorization of the training set (up to the sign of the output) and
no generalization whatsoever on new inputs. As a consequence of this, it is clear that the
generalization inherent in Fourier-based learning methods is due at least in part to the fact that not
all the Fourier coefficients are used in the approximation.
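
The memorization result can be observed directly in the matrix formulation. The sketch below builds the matrix $B$ for $n = 2$, checks that $B^T B = 2^n I$, and evaluates the approximate expansion at every input using all four coefficients; the helper names `walsh_matrix`, `matvec` and `approx_value`, and the choice of training vector, are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def walsh_matrix(n):
    """Return the 2^n input points and the matrix B whose rows are the chi_a."""
    pts = list(product((0, 1), repeat=n))
    rows = [[(-1) ** (sum(ai * xi for ai, xi in zip(a, x)) % 2) for x in pts]
            for a in pts]
    return pts, rows


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def approx_value(B, f_tilde, m, index):
    """Evaluate (By)^T ((1/m) B f_tilde) where y selects the input at `index`."""
    y = [1 if j == index else 0 for j in range(len(B))]
    By = matvec(B, y)
    coeffs = [Fraction(c, m) for c in matvec(B, f_tilde)]
    return sum(p * q for p, q in zip(By, coeffs))


pts, B = walsh_matrix(2)

# Training vector: f(x) on the three seen inputs, 0 on the unseen input 11.
f_tilde = [1, 1, -1, 0]
m = 3
values = [approx_value(B, f_tilde, m, i) for i in range(len(pts))]

# B^T B, computed entry by entry, to confirm the factor 2^n.
gram = [[sum(B[k][i] * B[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)]
```

Every training input comes back scaled by $2^n/m = 4/3$ and the unseen input comes back as exactly 0, so keeping all coefficients reproduces the training set and says nothing about new inputs.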

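4.2.1 A matrix version of the example

For clarity, the example of section 4.1.1 is repeated here using the matrix formulation.
According to (15) the function $f$ is now written as a vector,

$$f = \begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix}.$$

The matrix $B$ is

$$B = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

and using (16) the Fourier expansion of $f$ is now written

$$\hat{f} = \frac{1}{4}Bf = \frac{1}{4}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 1 \\ 1 \end{pmatrix}.$$

Now considering the case where the function is unknown but a training set is available, the
approximate function represented by the training set is written as a vector using (21),

$$\tilde{f} = \begin{pmatrix} 1 \\ 1 \\ -1 \\ 0 \end{pmatrix},$$

and the approximate Fourier transform is obtained from (22),

$$\frac{1}{m}B\tilde{f} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ -1 \\ 0 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 \\ -1 \\ 3 \\ 1 \end{pmatrix}.$$

Actually, (22) gives not just the approximate Fourier transform for $f$, but the method for calculating
the individual values for the approximate expansion. For example, to calculate the approximate
value for 11, use (18) to get

$$y = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

Then by (22)

$$\tilde{f}(11) = (By)^T\left(\frac{1}{3}B\tilde{f}\right) = \begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 \\ -1 \\ 3 \\ 1 \end{pmatrix} = 0,$$

just as in example 4.1.1 and as proved in (23).

4.3 Quantum formulation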

The next step is to extend the vector representation into the quantum domain. Given a set
of $n$ qubits (where $n$ is the length of the binary input $x$ to $f$) whose basis states, $|x\rangle$, correspond to
all the different values for $x$, define

$$|f\rangle = \sum_{x \in \{0,1\}^n} c_x|x\rangle \tag{24}$$

where the amplitudes

$$c_x = f(x) \in \{-1,1\} \tag{25}$$

are properly normalized according to (2), as a quantum state of the $n$ qubits that describes the
function $f$. The domain of $f$ is encoded in the state labels and the range of the function in the
phases of the amplitudes. Similarly, given a set of examples $T$, define

$$|\tilde{f}\rangle = \sum_{x \in T} c_x|x\rangle, \tag{26}$$

where again the amplitudes are defined as in (25). This quantum state of the $n$ qubits describes the
partial function $\tilde{f}$. Next, define the operator $\hat{B}$ as the matrix $B$ and note that it acts on a quantum
state, transforming it from the $|x\rangle$ basis to the Fourier basis, $\chi_a$. Now that transformation takes
the form

$$\hat{B}|f\rangle = |\hat{f}\rangle \tag{27}$$

or

$$\hat{B}|\tilde{f}\rangle = |\hat{\tilde{f}}\rangle, \tag{28}$$

slightly abusing the hat notation by using it to symbolize both an operator and the quantum state
vector associated with the Fourier expansion of $f$. Similar to (19),

$$\hat{B}|x\rangle = |\chi\rangle \tag{29}$$

results in a quantum state that contains in its amplitudes the values of all Fourier basis functions for
the input $x$. Finally,

$$f(x) = \langle \hat{B}x|\hat{B}f\rangle \tag{30}$$

represents the full function and

$$\tilde{f}(x) = \langle \hat{B}x|\hat{B}\tilde{f}\rangle \tag{31}$$

represents an approximation of the function from $T$. Notice that we are still dealing with quantum
amplitudes, and therefore the result of the calculation is not directly available to us. As it turns out,
the quantum system will not be used to calculate $\tilde{f}(x)$ but only to determine the relative amplitudes
of the Fourier coefficients: those coefficients that are large will have "large" amplitudes.



4.3.1 A quantum example

Finally, the example of section 4.1.1 is repeated using the quantum formulation. This
example is particularly interesting in that it emphasizes the difference of this quantum formulation
from the previous two, indicating that we can carry the analogy only so far. According to (24) the
function $f$ is now written as a quantum superposition,

$$|f\rangle = \frac{1}{2}|00\rangle + \frac{1}{2}|01\rangle - \frac{1}{2}|10\rangle + \frac{1}{2}|11\rangle.$$

The operator $\hat{B}$ is

$$\hat{B} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

and using (27) the Fourier expansion of $f$ is now written

$$|\hat{f}\rangle = \hat{B}|f\rangle = \frac{1}{2}|00\rangle - \frac{1}{2}|01\rangle + \frac{1}{2}|10\rangle + \frac{1}{2}|11\rangle.$$

Now considering the case where the function is unknown but a training set is available, the
approximate function represented by the training set is written as a quantum superposition using
(26),

$$|\tilde{f}\rangle = \frac{1}{\sqrt{3}}|00\rangle + \frac{1}{\sqrt{3}}|01\rangle - \frac{1}{\sqrt{3}}|10\rangle,$$

and the approximate Fourier transform is obtained from (28),

$$|\hat{\tilde{f}}\rangle = \hat{B}|\tilde{f}\rangle = \frac{1}{2\sqrt{3}}|00\rangle - \frac{1}{2\sqrt{3}}|01\rangle + \frac{3}{2\sqrt{3}}|10\rangle + \frac{1}{2\sqrt{3}}|11\rangle.$$

Note that the amplitudes are not the exact Fourier coefficients now, but are instead proportional to
them. This is because of the unique statistical nature of quantum systems. It is also important to
mention a subtle point here about using quantum computation to approximate the Fourier
coefficients. Because of the normalization condition (2) we cannot use the appropriate $1/m$
normalization of equations (12) and (22). The result of this is that equation (31) is not really
approximating the Fourier coefficients of the original 2-valued function $f: \{0,1\}^n \to \{-1,1\}$ but
rather is computing the Fourier coefficients of a 3-valued function $\tilde{f}: \{0,1\}^n \to \{-1,0,1\}$, which has
the same "large" coefficients as $f$, except that they are normalized by an exponential factor.
Fortunately, we will not need to use the quantum system to approximate the large coefficients but
only to indicate which are the large coefficients. Still, this important difference will affect the
efficiency of the algorithm, and we will address the problem it creates in section 5.1. Finally,
using (30) and (31) gives us the ability to calculate function values and approximate function
values, respectively. For example, to calculate the value for the binary string 11, use (30) to get



$$f(x) = \langle \hat{B}x | \hat{B}f \rangle = \frac{1}{2}\begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \\ 1 \\ 1 \end{pmatrix} = 1,$$

and use (31) to get 

$$\tilde{f}(x) = \langle \hat{B}x | \hat{B}\tilde{f} \rangle = \frac{1}{2\sqrt{3}}\begin{pmatrix} 1 & -1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \\ 3 \\ 1 \end{pmatrix} = 0,$$

just as in the previous examples and as proved in (23). We might also mention here that although 
mathematically these equations yield the correct values, from a physical standpoint, they are 
impossible to implement. Fortunately again, we do not need to be able to implement (30) or (31), 
but instead just need to be able to implement (28), which is physically realizable. 
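
The numbers in this example can be checked with a small classical simulation. The following is only a sketch under stated assumptions, not part of the quantum formulation itself: the helper name `walsh_hadamard` is hypothetical, the training set is taken to be {00, 01, 10}, and the vector for x = 11 is taken to hold the unnormalized values $\chi_a(11)$, an assumption made so that (30) returns the function value itself.

```python
from math import sqrt, isclose

def walsh_hadamard(amps):
    """Apply B = H (x) ... (x) H to a list of 2**n amplitudes (classical simulation)."""
    out = list(amps)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = (a + b) / sqrt(2), (a - b) / sqrt(2)
        h *= 2
    return out

n = 2
f = [1, 1, -1, 1]                        # f(00), f(01), f(10), f(11)

# The full function as a normalized superposition, then its transform (27).
f_state = [v / sqrt(2 ** n) for v in f]
f_hat = walsh_hadamard(f_state)

# Assumed training set T = {00, 01, 10}; the unseen input 11 gets amplitude 0.
f_tilde = [1 / sqrt(3), 1 / sqrt(3), -1 / sqrt(3), 0.0]
f_tilde_hat = walsh_hadamard(f_tilde)    # transform (28)

# Unnormalized values chi_a(11) for a = 00, 01, 10, 11 (an assumption, see above).
chi_11 = [1, -1, -1, 1]
exact = sum(c * v for c, v in zip(chi_11, f_hat))          # equation (30)
approx = sum(c * v for c, v in zip(chi_11, f_tilde_hat))   # equation (31)
print(f_hat, f_tilde_hat, exact, approx)
```

Under these assumptions the exact inner product recovers f(11) = 1, while the approximation built from a training set that never saw 11 evaluates to 0 up to rounding.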

In the KMJ algorithm, the large coefficients are approximated using the instances in T as in 
(12) but recall that deciding which are the large coefficients first requires a membership oracle. 
The goal of the quantum formulation of the approximate Fourier expansion in (31) is to determine 
which are the large coefficients using only the training set T instead of using a membership oracle. 
The approach is to construct and observe the quantum system of equation (28). The intuitive idea 
is that since the large Fourier coefficients will be represented in this quantum state as amplitudes 
with "large" magnitude, they will have a "large" probability of being observed. Since observing 
the system will cause it to collapse to a single basis state corresponding to a single Fourier basis 
function, the process must be repeated and statistics kept for each basis state until a statistically 
significant measure of the relative amplitudes of the coefficients is achieved. How many times this 
observation process must be repeated to find a large coefficient will depend on how many large 
coefficients there are and on their relative magnitudes. This turns out to be the catch. As 
mentioned in 4.3.1, because of the nature of quantum systems, it is impossible to directly calculate 
equation (12). Instead, when using a quantum system the results of (12) are in essence multiplied 
by a normalization constant. 
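
The observe-and-tally procedure just described can be mimicked classically for very small n. This is an illustrative sketch only: a real quantum system would have to be prepared afresh before every observation, and the function names, the seed, and the number of shots are assumptions chosen for the illustration. The amplitudes are those obtained by applying the transform to the three-example training set of section 4.3.1.

```python
import random
from collections import Counter

def observe(amplitudes, rng):
    """Collapse to one basis state, chosen with probability |amplitude|**2."""
    weights = [abs(a) ** 2 for a in amplitudes]
    return rng.choices(range(len(amplitudes)), weights=weights, k=1)[0]

def sample_counts(amplitudes, shots, seed=0):
    """Repeat the (simulated) prepare-and-observe step and tally the outcomes."""
    rng = random.Random(seed)
    return Counter(observe(amplitudes, rng) for _ in range(shots))

# Amplitudes (1, -1, 3, 1) / (2 * sqrt(3)) for the basis states 00, 01, 10, 11.
amps = [x / (2 * 3 ** 0.5) for x in (1, -1, 3, 1)]
counts = sample_counts(amps, shots=2000)
most_common_index, _ = counts.most_common(1)[0]
print(format(most_common_index, "02b"), dict(counts))
```

In this toy case the basis state 10 carries probability 9/12, so it dominates the tally after a handful of observations; the difficulty discussed in section 5.1 is that for realistic n the gap between "large" and "small" probabilities is far narrower.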



5. Finding a Large Coefficient by Quantum Fourier Sampling 

Blum et al. have shown that under the uniform distribution, there exists a Fourier basis 
function that weakly approximates any DNF function f [Blu94]. In other words, there exists a 
Fourier basis function, $\chi_w$, that agrees with f on at least a $1-\varepsilon$ fraction of the inputs, where 

$$\varepsilon = \frac{1}{2} - \frac{1}{p(n,s)} \tag{32}$$

with p being some fixed polynomial, n being the number of inputs to f, and s being the size of the 
representation of f. This implies that the magnitude of the Fourier coefficient for the weak 
approximator is 

$$\hat{f}(w) = O\!\left(\frac{1}{p(n,s)}\right), \tag{33}$$

and using Chernoff bounds [Hag89] this can be closely approximated using a polynomial number 
m of examples drawn randomly from an example oracle, if we know which of the $2^n$ basis 
functions it is. In other words, since this coefficient is relatively large, if we had some way of 
sampling the Fourier coefficient distribution, it should be readily identifiable with a polynomial 
number of samples. Following this line of thinking, the results developed in section 4 suggest a 
simple algorithm for finding this large Fourier coefficient, which is presented in figure 1. 

    Repeat until confidence > $\delta$
        form $|\tilde{f}\rangle$, a quantum superposition representing the training set 
        compute the Fourier transform on the superposition using the operator $\hat{B}$ 
        observe the system to produce a random sample of the coefficient distribution 
    Approximate the most commonly observed coefficient classically 

Figure 1. Algorithm for finding large Fourier coefficients 

There are four ingredients for the algorithm, the first three of which are performed on a quantum 
computer while the fourth is performed on a classical computer. In order for the algorithm to run 
in polynomial time, all of them must be shown to be polynomial. Specifically, we must address: 
1) the construction of the state $|\tilde{f}\rangle$; 2) the implementation of the operator $\hat{B}$; 3) the number of 
times this process must be repeated in order to identify a large coefficient; and 4) the classical 
approximation of that coefficient. Constructing the state $|\tilde{f}\rangle$ is nontrivial and a method for doing 
so in $O(mn)$ time is detailed in [Ven98]. Implementing $\hat{B}$ turns out to be extremely easy on a 
quantum computer, and it is in fact the basis of most quantum algorithms discovered to date. 
Computing the Walsh transform of a quantum state is accomplished simply by applying the 
elementary quantum operator 

$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{34}$$

to each qubit in parallel (see for example [Gro96]). In other words, $\hat{B} = H \otimes \cdots \otimes H$, where $\otimes$ is 
the tensor product (direct matrix product) and H appears in the product n times, one for each 
input. Approximating a coefficient is easy on a classical computer and can be accomplished using 
equation (12) and a polynomial number of function examples. 
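
Two of these ingredients are easy to illustrate classically for small n. The sketch below is hedged accordingly: it builds the operator explicitly as an n-fold tensor product of H (exponentially large, so only useful as a check), and it estimates a single coefficient by averaging $f(x)\chi_a(x)$ over the examples, which is assumed here to be the form of equation (12). The parity form $\chi_a(x) = (-1)^{a \cdot x}$ and all function names are assumptions made for the illustration.

```python
from math import sqrt, isclose

H = [[1 / sqrt(2), 1 / sqrt(2)],
     [1 / sqrt(2), -1 / sqrt(2)]]

def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def walsh_operator(n):
    """Explicit matrix of H (x) ... (x) H with n factors."""
    B = [[1.0]]
    for _ in range(n):
        B = kron(B, H)
    return B

def chi(a, x):
    """Assumed parity basis function (-1)**(a . x), with bit strings stored as ints."""
    return -1 if bin(a & x).count("1") % 2 else 1

def estimate_coefficient(a, examples):
    """Average of f(x) * chi_a(x) over (x, f(x)) pairs, in the style of equation (12)."""
    return sum(fx * chi(a, x) for x, fx in examples) / len(examples)

# Training set of section 4.3.1: f(00) = 1, f(01) = 1, f(10) = -1.
examples = [(0b00, 1), (0b01, 1), (0b10, -1)]
print(estimate_coefficient(0b10, examples))
```

Every entry of the explicit operator equals $\chi_a(x)/\sqrt{2^n}$, which is why applying H to each qubit computes all $2^n$ correlations at once, whereas the classical estimate handles one index a at a time.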

5.1 A Quantum Quirk in Estimating Coefficients 

This leaves the question of how many times we must repeat the Fourier sampling process 
in order to find, with high probability, a large coefficient, and in fact, this is a difficult point. To 
see why, consider the following. The unitarity condition of equation (2) requires that the sum of 
the squares of the coefficients in a quantum state be equal to 1. For convenience we reproduce the 
equation here, recalling that the quantum coefficients are within a multiplicative constant of being 
equal to the Fourier coefficients $\hat{f}(a)$. 

$$\sum_i |c_i|^2 = 1 \tag{35}$$

As mentioned in 4.3.1, what the Fourier sampling algorithm is really doing is not approximating 
the Fourier coefficients of $f:\{0,1\}^n \to \{-1,1\}$ but rather is calculating the Fourier coefficients of 
$\tilde{f}:\{0,1\}^n \to \{-1,0,1\}$. Another way to look at this is to say that the Fourier sampling algorithm is 
approximating the coefficients of $f:\{0,1\}^n \to \{-1,1\}$ but with a normalization constant of $1/\sqrt{m 2^n}$ 
rather than the 1/m required by equation (12). (Either viewpoint will produce the same result.) 
Because of this, in contrast to the nice polynomial sized Fourier coefficient given in (33), the size 
of the corresponding quantum coefficient, $c_w$, is 

$$c_w = O\!\left(\frac{\sqrt{m/2^n}}{p(n,s)}\right), \tag{36}$$

and its square is thus 

$$(c_w)^2 = O\!\left(\frac{m/2^n}{p^2(n,s)}\right). \tag{37}$$

Therefore equations (35-37) require that the number C of nonzero coefficients be at least 

$$1 = C \cdot O\!\left(\frac{m/2^n}{p^2(n,s)}\right) \;\Rightarrow\; C = O\!\left(\frac{2^n\, p^2(n,s)}{m}\right). \tag{38}$$

Also, the smallest possible quantum coefficient, $c_s$, is 

$$c_s = \frac{1}{\sqrt{2^n}}. \tag{39}$$



Therefore, if the number of examples m is significantly less than $2^n$, which will be the case if we 
require a polynomial-time algorithm, then there are an exponential number of nonzero coefficients 
in the quantum state with the largest only a factor of $\sqrt{m}$ larger than the smallest. (When the 
system is observed, this difference is magnified to a factor of m; however, this is not significant in 
the end result.) This means that $O(2^n)$ samples of the amplitude distribution (obtained by 
observing the quantum system) will be required to discriminate the "large" coefficients from the 
"small" ones. 

Another way to look at this is to say that the vector representing the function is very sparse. 
As a result, it correlates well with almost every Fourier basis function and therefore, the "large" 
coefficients are not very much larger than the "small" ones. In fact, as mentioned above, this is 
unavoidable in a quantum system because of the requirement that all operators be unitary. 
Therefore, the quantum system is attempting to calculate information about an exponential number 
of Fourier coefficients using only a polynomial number of function examples. Because of the 
unitarity requirement this polynomial amount of information gets normalized by an exponential 
instead of the polynomial necessary for closely approximating the coefficients as in (12). Because 
of the normalization inherent in unitary quantum processes, it is unlikely that a quantum algorithm 
can be developed to overcome this problem. However, this is certainly not a proof that it is 
impossible. 
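
The normalization argument of this section can be made explicit with a short derivation. It is a sketch under assumptions inferred from the worked example of section 4.3.1 rather than a restatement of the earlier equations: the training-set state is taken to be $|\tilde{f}\rangle = \frac{1}{\sqrt{m}}\sum_{x \in T} f(x)|x\rangle$, the entries of $\hat{B}$ are taken to be $\chi_a(x)/\sqrt{2^n}$, and the classical estimate of equation (12) is taken to be the average $\frac{1}{m}\sum_{x \in T} f(x)\chi_a(x)$.

```latex
\begin{align*}
c_a &= \langle a|\hat{B}|\tilde{f}\rangle
     = \frac{1}{\sqrt{2^n}} \sum_{x} \chi_a(x)\,\langle x|\tilde{f}\rangle \\
    &= \frac{1}{\sqrt{m\,2^n}} \sum_{x \in T} f(x)\,\chi_a(x)
     = \sqrt{\frac{m}{2^n}}\,\Bigl(\frac{1}{m} \sum_{x \in T} f(x)\,\chi_a(x)\Bigr).
\end{align*}
```

Under these assumptions each amplitude is the classical estimate scaled by $\sqrt{m/2^n}$, which is exponentially small whenever m is polynomial in n, while unitarity still forces the squared amplitudes to sum to 1.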

There is an equally valid classical analysis based upon Parseval's identity that will yield the 
same result, though some of the intermediate results are different by 0(aA). 



5.2 A NonClassical Result -- Finding Large Coefficients in $O(\sqrt{2^n})$ Time 

If the quantum algorithm of figure 1 requires exponential time to identify the large quantum 
coefficients, it is certainly no better than a classical algorithm that simply approximates each 
Fourier coefficient using equation (12) and thus also requires $O(2^n)$ time. However, the algorithm 
actually requires only $O(\sqrt{2^n})$ steps in order to identify the large quantum coefficients and thus 
still represents a significant improvement over the classical approach. To see why, choose the 
number of examples m drawn from an example oracle to be 

$$m = \sqrt{2^n}. \tag{40}$$
Then by equation (36) the quantum coefficient $c_w$ which corresponds to the Fourier function that 
weakly approximates f is 

$$c_w = O\!\left(\frac{\sqrt{m/2^n}}{p(n,s)}\right) = O\!\left(\frac{\sqrt{\sqrt{2^n}/2^n}}{p(n,s)}\right) = O\!\left(\frac{1}{2^{n/4}\, p(n,s)}\right) \tag{41}$$

and using equation (37) its square is 

$$(c_w)^2 = O\!\left(\frac{1}{2^{n/2}\, p^2(n,s)}\right). \tag{42}$$

Using equation (38) the number C of nonzero coefficients is now at least 

$$1 = C \cdot O\!\left(\frac{1}{2^{n/2}\, p^2(n,s)}\right) \;\Rightarrow\; C = O\!\left(2^{n/2}\, p^2(n,s)\right). \tag{43}$$



Since the smallest possible quantum coefficient $c_s$ is still given by equation (39), we now have a 
situation in which there are still an exponential number of nonzero coefficients in the quantum 
state, but now the largest is a factor of $2^{n/4}$ larger than the smallest. When the system is observed, 
this difference is magnified to a factor of $\sqrt{2^n}$, and therefore $O(\sqrt{2^n})$ samples of the amplitude 
distribution will suffice to identify the large coefficient. 
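
The ratios quoted here are simple to verify numerically. The following sketch rests on two labeled assumptions: the polynomial factor 1/p(n,s) is ignored, and the smallest amplitude is taken to be $1/\sqrt{2^n}$, which is the value the stated ratios imply. The function name is hypothetical.

```python
from math import sqrt, isclose

def amplitude_ratio(n, m):
    """Ratio of the large amplitude to the smallest one for n inputs and m examples."""
    large = sqrt(m / 2 ** n)      # order of c_w, ignoring the factor 1 / p(n, s)
    small = 1 / sqrt(2 ** n)      # c_s, assumed here to be 1 / sqrt(2**n)
    return large / small

n = 30
m = 2 ** (n // 2)                 # m = sqrt(2**n); exact as written for even n
ratio = amplitude_ratio(n, m)     # sqrt(m), that is 2**(n/4)
probability_ratio = ratio ** 2    # m, that is sqrt(2**n)
print(ratio, probability_ratio)
```

With n = 30 the amplitude ratio is $2^{7.5}$ and the ratio of observation probabilities is $2^{15}$, matching the factors $2^{n/4}$ and $\sqrt{2^n}$ given in the text.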



5.3 Analysis and discussion 

Using some concrete numbers, assume that n = 30 and that therefore $m = \sqrt{2^{30}} = 2^{15} = 
32768$, numbers which can represent a very non-trivial learning task. Then the algorithm requires 
$O(2^{15}) < 10^5$ operations. For comparison, [Bar96] gives estimates of how many operations might 
be performed before decoherence for various possible physical implementation technologies for the 
qubit. These estimates range from as low as $10^3$ (electron GaAs and electron quantum dots) to as 
high as $10^{13}$ (trapped ions), so our example falls comfortably into this range, even near the low 
end of it. Further, the algorithm would require only 61 qubits. (Although it appears that the 
algorithm presented here requires only n qubits, the algorithm depends on a method for 
representing the training set as a quantum state. As mentioned before, an explicit algorithm for 
constructing such a quantum state exists [Ven98]; however it requires 2n+1 qubits.) 

In contrast, Shor's algorithm requires hundreds or thousands of qubits to perform an 
interesting factorization. For example, [Ved96] gives estimates for the number of qubits needed 
for modular exponentiation, which dominates Shor's algorithm: anywhere from 7n+1 down to 
4n+3. For a 512 bit number (which RSA actually claims may not be large enough to be safe 
anymore, even classically), this translates into anywhere from 3585 down to 2051 qubits. As for 
elementary operations, they claim $O(n^3)$, which would be in this case $O(10^8)$. Therefore, the 
algorithm presented here requires orders of magnitude fewer operations and qubits than Shor's in 
order to perform significant computational tasks. This is an important result since quantum 
computational technology is still immature, and maintaining and manipulating the coherent 
superposition of a quantum system of 60 qubits should be attainable sooner than doing so for a 
system of 2000 qubits. In this sense, the algorithm compares nicely with other quantum 
algorithms. 
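
The resource figures quoted in this comparison follow from a few lines of arithmetic, sketched below. The function names are hypothetical, and the formulas are simply those quoted in the text (m equal to the square root of $2^n$, 2n+1 qubits for the state construction, and the 7n+1 and 4n+3 qubit estimates for modular exponentiation); constant factors hidden by the O-notation are ignored.

```python
from math import isqrt

def learner_resources(n):
    """Example count, order of the number of observations, and qubits for n inputs."""
    m = isqrt(2 ** n)              # m = sqrt(2**n), equation (40)
    observations = isqrt(2 ** n)   # order of the number of samples needed
    qubits = 2 * n + 1             # qubit count quoted for the construction in [Ven98]
    return m, observations, qubits

def shor_qubit_range(bits):
    """Upper and lower qubit estimates for modular exponentiation quoted from [Ved96]."""
    return 7 * bits + 1, 4 * bits + 3

print(learner_resources(30))
print(shor_qubit_range(512))
print(512 ** 3)                    # order of the elementary operation count, n**3
```

For n = 30 this gives 32768 examples and 61 qubits, against 3585 down to 2051 qubits and on the order of $10^8$ elementary operations for a 512 bit factorization.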

As mentioned earlier, the algorithm also compares favorably with classical approaches to 
finding large coefficients. Jackson's Harmonic Sieve runs in polynomial time; however, as 
discussed in section 2 this algorithm requires a membership oracle, or black box access to the 
function f that we are trying to learn, in order to find the large coefficients. In cases where this is 
not realistic, the algorithm presented here provides an alternative that requires only an example 
oracle. There are no known classical algorithms for finding large coefficients that use only an 
example oracle and run any faster than $O(2^n)$, so again the algorithm presented here represents a 
significant improvement. 

6. Concluding Comments 

This paper presents a quantum computational learning algorithm that takes advantage of the 
unique capabilities of quantum computation to produce a significant result in both the field of 
computational learning and the field of quantum computation. The main result of the paper is a 
quantum computational learning algorithm for learning the function class DNF over the uniform 
distribution using only an example oracle in time significantly faster than any known classical 
algorithm operating under the same constraints. In other words, the paper makes an important 
contribution to both the field of computational learning theory and to the field of quantum 
computation — producing both a new learning theoretic result and a new quantum algorithm that 
accomplishes something that no classical algorithm has been able to do. Further, the algorithm is 
unique among quantum computational algorithms as it does not assume a priori knowledge of a 
function f and does not operate on a quantum superposition that includes all possible basis states. 

Further, the paper introduces a promising new field to which quantum computation may be 
applied to advantage — that of computational learning. In fact, it is the authors' opinion that this 
application of quantum computation will, in general, demonstrate much greater returns than its 
application to more traditional computational tasks (though Shor's algorithm is an obvious 
exception). We make this conjecture because results in both quantum computation and 
computational learning are by nature probabilistic and inexact, whereas most traditional 
computational tasks require precise and deterministic outcomes. 

One obvious area for future work is, of course, the physical implementation of the 
algorithm in a real quantum system. As mentioned in sections 3.6 and 5.3, the fact that this 
algorithm requires very few qubits for non-trivial problems combined with the recent advances in 
quantum technology suggests that the realization of quantum computers performing useful 
computation may be possible in the near future. A second important topic for future work is to 
determine whether or not the improvement obtained here over classical methods is optimal. If it is 
not, future work may produce further gains. In the meantime, a simulation of the quantum 
algorithm is being developed to run on a classical computer at the cost of an exponential slowdown 
in the size of the learning problem. Thus, learning problems that are non-trivial and yet small in 
size will provide interesting study in simulation. Various methods of using and combining the 
large Fourier coefficients are currently being investigated. One method, of course, is to use the 
boosting method that Jackson uses in his Harmonic Sieve, and we are considering other methods 
of combination as well. We are also investigating using higher spin systems for learning 
nonbinary problems. Another obvious and important area for future research is investigating 
further the application of quantum computational ideas to the fields of computational learning 
theory and machine learning, that is, the discovery of other quantum computational learning algorithms. 

As a final comment, Grover's well-known algorithm for quantum search also provides an 
$O(\sqrt{2^n})$ improvement over classical approaches to the same problem. Grover's result is 
particularly important because in the case of searching an unordered database, the $O(2^n)$ bound for 
classical approaches is tight: there is no classical way to do better. In the case of learning DNF 
using only an example oracle, it is still an open problem whether or not a classical algorithm exists 
that is better than $O(2^n)$; however, it is generally suspected that no such algorithm exists. This 
provides fuel for speculation that perhaps there exists a quantum information theoretic law that 
bounds the improvement of quantum algorithms over classical algorithms by $O(\sqrt{2^n})$. According 
to Shor's paper, the best classical algorithm for prime factorization of an integer that can be 
represented in n bits runs in time $O(g)$, where $g = e^{n^{1/3}(\log n)^{2/3}}$ [Len93]. Since $g = O(\sqrt{2^n})$ but $g \neq 
\Theta(\sqrt{2^n})$, Shor's algorithm in some sense does not achieve as much of a speedup as do Grover's 
algorithm and the algorithm presented here. 